34  Patterns of Topography & Vegetation I

Learning Objectives

After completing this tutorial you should be able to

Download the directory for this project here, make sure the directory is unzipped and move it to your bi328 directory. You can open the Rproj for this module either by double clicking on it which will launch Rstudio or by opening Rstudio and then using File > Open Project or by clicking on the Rproject icon in the top right of your program window and selecting Open Project.

There should be a file named 34_vegetation-I.qmd in that project directory. Use that file to work through this tutorial - you will hand in your rendered (“knitted”) quarto file as your homework assignment. So, first thing in the YAML header, change the author to your name. You will use this quarto document to record your answers. Remember to use comments to annotate your code; at minimum you should have one comment per code set1 you may of course add as many comments as you need to be able to recall what you did. Similarly, take notes in the document as we discuss discussion/reflection questions but make sure that you go back and clean them up for “public consumption”.

  • 1 You should do this whether you are adding code yourself or using code from our manual, even if it isn’t commented in the manual… especially when the code is already included for you, add comments to describe how the function works/what it does as we introduce it during the participatory coding session so you can refer back to it.

  • Before you get started, let’s make sure to read in all the packages we will need - take a look and install any that you are still missing!

    # load libraries ----
    
    # reporting
    library(knitr)
    
    # visualization
    library(ggplot2)
    library(ggthemes)
    library(patchwork)
    library(viridis)
    
    # remote sensing & GIS
    library(sf)
    library(raster)
    library(ggspatial)
    library(rnaturalearth)
    library(rnaturalearthhires)
    library(rnaturalearthdata)
    
    # data wrangling
    library(plyr)
    library(dplyr)
    library(tidyr)
    library(tibble)
    library(readr)
    library(skimr)
    library(janitor)
    library(glue)
    library(magrittr)
    library(lubridate)
    
    # stats
    library(corrplot)
    library(Hmisc)
    
    # set other options ----
    options(scipen=999)
    
    knitr::opts_chunk$set(
      tidy = FALSE, 
      message = FALSE,
        warning = FALSE)

    34.1 NEON & the power of longterm data sets

    Consider this

    Watch this short introduction to the National Ecological Observatory Network (NEON) and read this overview of the field stations and domains within the network of monitoring stations.

    Briefly outline what NEON is, how it is designed, what data is being measured, what the central goal is, and how the design supports this mission.

    Discuss the value of long-term monitoring stations and standardized data gathering in light of various modules we have worked through this semesters and what you have learned about certain fields of ecological study.

    Did it!

    [Your answer here]

    34.2 Plot maps of topography and vegetation

    The data set we are going to start out with is from a NEON station in the Sierra Nevada mountains in California from a region called Soaproot Saddle.

    You already have several *.tif files in your data folder. These are raster data sets with information on NDVI and topography that were sampled for this NEON site in 2018.

    Consider this

    Briefly describe how a raster data set encodes information.

    Did it!

    [Your answer here]

    First we will load Digital Terrain Model data (DTM) which was obtained using LiDAR.

    dtm <- raster("data/NEON_D17_SOAP_DP3_298000_4100000_DTM.tif")
    
    class(dtm)
    [1] "RasterLayer"
    attr(,"package")
    [1] "raster"
    Consider this

    Briefly describe what data the terrain model raster data set contains.

    Did it!

    [Your answer here]

    Next, we will read in the Digital Surface Model LiDAR raster data set.

    dsm <- raster("data/NEON_D17_SOAP_DP3_298000_4100000_DSM.tif")
    Consider this

    Briefly describe what data the surface model raster data set contains.

    Did it!

    [Your answer here]

    We already have information on elevation, but let’s calculate slope and aspect as additional metrics to describe the topography.

    Consider this

    Define what slope and aspect are and how they can be used to describe the topography of a landscape.

    Did it!

    [Your answer here]

    We will use degrees as a measure for slope based on the DTM raster data set.

    slope <- terrain(dtm, opt = "slope", unit = "degrees", neighbors = 8)

    To determine aspect, we will calculate the “northness” as the cosine of aspect, which will read in radians2.

  • 2 Radian is the SI unit for measuring angles; it is a dimensionless value.

  • # calculate aspect
    aspect <- terrain(dtm, opt = "aspect", unit = "radians", neighbors = 8)
    
    # calculate northness
    north <- cos(aspect)

    Now we have everything we need to describe the topography of the monitoring site using elevation, slope, and aspect.

    Our overarching question is the extent to which topography impacts vegetation. We will use two variables to describe “levels of vegetation”, first the Normalized difference vegetation index (NDVI) and second vegetation height.

    Spectral imaging was used to determine the Normalized Difference vegetation index (NDVI) for our monitoring station.

    ndvi <- raster("data/NEON_D17_SOAP_DP3_298000_4100000_NDVI.tif")
    Consider this

    Describe what NDVI measures and how it can be used to assess vegetation.

    Did it!

    [Your answer here]

    We can calculate the vegetation height as DSM - DTM.

    veg_ht <- dsm - dtm

    Currently, we have several individual raster objects, each forming a different layer describing topography and vegetation for the same locations.

    We can combine all these individual layers into a raster stack. This produces a list in which each element is a different raster layer. Doing this has some advantages, for example, it is straight forward to make a single plot combining these different layers.

    # create raster stack
    all_data <- stack(dtm, dsm, slope, north, ndvi, veg_ht)
    
    # rename raster layers
    names(all_data) <- c("DTM", "DSM", "slope", "north", "NDVI", "Vegetation.Height")

    Now we can create maps for each of our raster files3. We will use the function plot() instead of our usual ggplot options to keep it simple.

  • 3 This may take a second to plot … you are processing a bunch of data!

  • plot(all_data, 
         col = viridis(255))

    Let’s consider what correlations we would expect to find based on these six maps.

    Consider this

    Compare and contrast these maps and describe your observations to argue whether overall you think that vegetation patterns are influenced by topography.

    Did it!

    [Your answer here]

    Consider this

    Argue which topographic metrics you think should have a strong association with NDVI and describe what you would expect this relationship to look like (e.g. how might they correlate?).

    Did it!

    [Your answer here]

    Consider this

    Argue which topographic metric you think should have a strong association with vegetation height and describe what you would expect that relationship to look like.

    Did it!

    [Your answer here]

    34.3 Explore relationships between topographic variable and vegetation patterns

    First, lets convert our raster layers into data frames for easier use. We’ll start by creating an empty data frame with as many rows as there are cells in our raster objects. Then we can use the function extract() pulling out the information for each pixel and put it into a set of columns using the extent argument, which tells R to pull out all the information in a specific raster layer and turn it into a vector4.

  • 4 Recall, each column of a data.frame is a vector

  • # create and empty data frame
    df <- as.data.frame(matrix(NA, nrow = ncell(dtm), ncol=0))
    
    # extract vegetation height
    df$veg_ht <- raster::extract(veg_ht, extent(veg_ht))
    
    # extract ndvi
    df$ndvi <- raster::extract(ndvi, extent(ndvi))
    
    # extract dtm
    df$dtm  <- raster::extract(dtm, extent(dtm))
    
    # extract slope
    df$slope  <- raster::extract(slope, extent(slope))
    
    # extract aspect
    df$north <- raster::extract(north, extent(north)) 
    
    head(df)
         veg_ht      ndvi     dtm slope north
    1 7.8100586 0.8477020 1297.25   NaN   NaN
    2 8.9599609 0.8142292 1297.36   NaN   NaN
    3 7.5100098 0.8118778 1297.39   NaN   NaN
    4 0.7000732 0.8314815 1297.35   NaN   NaN
    5 0.8100586 0.7325861 1297.32   NaN   NaN
    6 0.0000000 0.6941839 1297.20   NaN   NaN

    Note, that we no longer have spatial information (coordinates), however, values in the same row do correspond to the same pixel location in our raster object.

    Consider this

    Note, that some of the slope and north values are NaN - why do you think that is?

    Did it!

    [Your answer here]

    Let’s go ahead and remove all rows that contain a missing value for one parameter.

    df <- df %>%
      filter(!is.na(slope)) %>%
      filter(!is.na(north))
    
    head(df)
          veg_ht      ndvi     dtm    slope      north
    1 6.46008301 0.8252837 1297.20 11.99227 -0.8944847
    2 6.85998535 0.8504076 1297.03 18.01359 -0.9609535
    3 0.61999512 0.6705710 1297.00 17.02542 -0.9999917
    4 0.00000000 0.6165509 1296.93 14.48724 -0.9770750
    5 0.00000000 0.5876359 1296.89 14.64421 -0.9805627
    6 0.03991699 0.5609888 1296.90 18.33933 -0.8936710

    Because we have so much data, we will take a 1% subset of the data to decrease the computational power and time needed.

    Because our data set is so large (1 Million rows!), we would expect that a random subset is representative of the relationships as a whole.

    dplyr::slice_sample() can be used to specify the proportion of rows that you would like to retain. The function will return a random sub sample.

    # set seed for reproducibility
    set.seed(42)
    
    # randomly select 1% of rows
    df_sub <- df %>%
      slice_sample(prop = 0.01)

    Now, let’s take a look at the relationship between the vegetation and topographic variables.

    To do this efficiently using the tidyverse principles and some ggplot magic, need to pivot our data set. Ultimately, we want to have a data set with one column with our topography parameters, one with the topography measurements, one with vegetation variables and one with measurements; then we will be able to use facet_grid() to plot all combinations of variables.

    df_plot <- df_sub %>%
      pivot_longer(names_to = "topog_param", values_to = "topog_meas", 3:5) %>%
      pivot_longer(names_to = "veget_param", values_to = "veget_meas", 1:2)
    
    ggplot(df_plot, aes(x = topog_meas, y = veget_meas)) +
      geom_point(alpha = 0.25, size = .75) +
      facet_grid(veget_param ~ topog_param, scales = "free") +
      labs(x = "Topography", y = "Vegetation") +
      theme_facet

    Figure 34.1: Relationship of topography and vegetation. Topography is described as elevation (dtm, in meters), northness (radians), and slope (degrees); vegetation is assessed using NDVI and vegetation height.
    Consider this

    Use the scatter plots to make predictions about statistical relationships. Consider whether any of these plots look like they are visualizing strong relationships.

    Argue whether you think larger data sets make it easier or harder to identify distinct relationships “by eye”.

    Did it!

    [Your answer here]

    Let’s determine whether these variables are correlated using the Pearson correlation coefficients. We can use Hmisc::rcorr to calculate all pairwise correlations and test whether the relationships are significant.

    P_corr <- rcorr(as.matrix(df), type="pearson")

    The output is a list. Recall that we can access individual components of a list using $, similar the way we access the individual vectors comprising a data.frame. In this case P_corr$r contains the correlation coefficients, and P_corr$p contains the p-values.

    Visualizing correlations in a correlation plot can be helpful for a quick overview when you are working with a lot of different values, we can do this using the function corrplot().

    corrplot(P_corr$r)

    Figure 34.2: Pairwise relationships of all topographical and vegetation variables measured using Pearson’s correlation coefficient.

    Alternatively, you can print the correlation coefficients as a matrix.

    P_corr$r
                veg_ht        ndvi         dtm        slope        north
    veg_ht  1.00000000  0.15271745 -0.10739489  0.054293139  0.049763864
    ndvi    0.15271745  1.00000000 -0.09762246  0.204216049  0.031199918
    dtm    -0.10739489 -0.09762246  1.00000000  0.038739447  0.086963361
    slope   0.05429314  0.20421605  0.03873945  1.000000000 -0.005534425
    north   0.04976386  0.03119992  0.08696336 -0.005534425  1.000000000
    Consider this

    Describe whether or not these relationships conform to your predictions when you looked at the maps and when you looked at the scatter plots and discuss what could be causing differences from your expectations.

    Did it!

    [Your answer here]

    Let’s also print the corresponding p-values:

    P_corr$P
           veg_ht ndvi dtm            slope            north
    veg_ht     NA    0   0 0.00000000000000 0.00000000000000
    ndvi        0   NA   0 0.00000000000000 0.00000000000000
    dtm         0    0  NA 0.00000000000000 0.00000000000000
    slope       0    0   0               NA 0.00000003325176
    north       0    0   0 0.00000003325176               NA
    Consider this

    Assess whether or not any of your relationships are statistically significant. Discuss the importance of p-values for large data sets.

    Did it!

    [Your answer here]

    Let’s coerce our correlation coefficients into something a bit more tidy. while we’re at it we can also add the abbreviation for the site we’ve been looking at. Let’s write that data frame out as a text file in our results folder so we can access our results down the line.

    tidy_cor <- as.data.frame(P_corr$r) %>%
      rownames_to_column("Param1") %>%
      pivot_longer(names_to = "Param2", values_to = "pearson", 2:6) %>%
      filter(Param1 %in% c("veg_ht", "ndvi")) %>%
      filter(!Param2 %in% c("veg_ht", "ndvi")) %>%
      mutate(Site = "SOAP")
    
    write_delim(tidy_cor, "results/SOAP_correlation.txt", delim = "\t")

    34.4 Acknowledgments

    These activities are based on the EDDIE Remote Sensing module.5

  • 5 Dahlin, K. M. (2021). Remote Sensing of Plants and Topography in R (Project EDDIE).